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IN THE CLAIMS 



Please amend the claims as follows: 

1. (Currently Amended) An apparatus comprising: 
a memory interface; 

a fest plurality of queues connected to the memory interface , including a first queue and 
a second queue, each of the first plurality of queues for holding a plurality of pending memory 
requests; 

one or more instruction-processing circuits, wherein each instruction-processing circuit is 
operatively coupled [[to]] through the plurality of queues to the memory interface and wherein 
each of the plurality of instruction-processing circuits that inserts one or more memory requests 
into at least one of the queues based on a first memory operation instruction, that specifies a 
m e mory op e ration, and that inserts a first synchronization marker into the first queue and inserts 
a second synchronization marker into the second queue based on a second synchronization 
operation instruction, that specifie s a synchronization op e ration, and that and inserts one or more 
memory requests into at least one of the queues based on a third second memory operation 

a first synchronization circuit, operatively coupled to the first plurality of queues, that 
selectively halts processing of further memory requests from the first queue based on the first 
synchronization marker reaching a predetermined point in the first queue until the corresponding 
second synchronization marker reaches a predetermined point in the second queue[[.]]; 

wherein each of the memory requests is a memory reference, wherein the memory 
reference is generated as a result of execution of instructions by the instruction-processing 
circuits. 



2. (Currently Amended) The apparatus of claim 1, wherein the plurality of queues is wi 
a proceo s or, and wherein the first queue is used for only synchronization markers and vector 




memory references, r e qu e sts and synchronizations ^,]] i 
synchronization markers and scalar memory references. 
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3. (Currently Amended) The apparatus of claim 2, wherein the first synchronization 
operation instruction is an Lsync-type instruction. 

4. (Currently Amended) The apparatus of claim 2, wherein the first synchronization 
operation instruction is an Lsync V,S-type instruction. 

5. (Currently Amended) The apparatus of claim 2, wherein, for a second synchronization 
operation instruction, a corresponding synchronization marker is inserted to in only the first 
queue. 

6. (Original) The apparatus of claim 5, wherein the second synchronization instruction is an 
Lsync-type instruction. 

7-8. (Canceled) 

9. (Currently Amended) The apparatus of claim [[1]] 2, further comprising [[:]] wherein the 
first queue includes two subqueues, including a first subqueue and a second subqueue, wherein 
the first subqueue is for holding the vector memory references and synchronization markers 
associated with the vector memory references and wherein the second subqueue is a fifth queue 
for holding a plurality of write store data elements and synchronization markers associated with 
the store data elements , wherein each write store data element in the second subqueue 
corresponds to a one of the memory r e quest requests in the first qu e ue subqueue , and wherein the 
write store data elements are loaded into the fifth qu e u e second subqueue decoupled from the 
loading of the memory requests into the first queue subqueue . 

10. (Currently Amended) The apparatus of claim [[1]] 4, further comprising [[:]] wherein the 
instruction-processing circuits include a data cache[[;]] and an external cache [[,]] and wherein 
first synchronization instruction is an the Lsync V,S type instruction^,]] prevents preventing 
subsequent scalar references from accessing the data cache until all vector references have been 
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sent to fee an external cache and all vector writes have caused any necessary invalidations of the 
data cache. 

11. (Currently Amended) A method comprising: 
providing a memory interface; 

providing a first plurality of queues connected to the memory interface , including a first 
queue and a second queue, each of the first plurality of queues for holding a plurality of pending 
memory requests; 

providing one or more instruction-processing circuits, wherein each instruction- 
processing circuit is operatively coupled through the plurality of queues to the memory interface; 

inserting one or more memory requests into at least one of the queues based on a first 
memory operation instruction that specifi e s a memory op e ration executed in one of the 
instruction-processing circuits ; 

inserting bas e d on a s e cond instruction that specifies a synchronization operation 
inserting a first synchronization marker into the first queue and inserting a second 
synchronization marker into the second queue based on a synchronization operation instruction 
executed in one of the instruction-processing circuits ; and 

inserting one or more memory requests into at least one of the queues based on a third 
second memory operation instruction that specifies a memory operation executed in one of the 
instruction-processing circuits ; 

processing memory requests from the first queue; and 

selectively halting further processing of memory requests from the first queue based on 
the first synchronization marker reaching a predetermined point in the first queue until the 
corresponding second synchronization marker reaches a predetermined point in the second 
queue[[.]]i 

wherein each of the memory requests is a memory reference. 



12. (Currently Amended) The method of claim 1 1 , wherein the plurality of queu e s is withi 

a first proc e ssor, and wherein the ins e rting to the first queue is-fer- stores only synchronization 




marks and vector memory references, requests 




*[[,]] and the inserting to 
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wherein the second queue i s for stores only synchronization marks and scalar memory 



1 3 . (Currently Amended) The method of claim 12, wherein the first synchronization 
operation instruction is an Lsync-type instruction. 

14. (Currently Amended) The method of claim 12, wherein the fifst synchronization 
operation instruction is an Lsync V,S-type instruction. 

15-18. (Canceled) 

19. (Currently Amended) The method of claim [[11]] 12, further comprising [[:]] wherein the 
first queue includes two subqueues, including a first subqueue and a second subqueue, wherein 
the first subqueue stores the vector memory references and synchronization markers associated 
with the vector memory references and wherein the second subqueue stores queueing a plurality 
of write store data elements to a fifth qu e u e and synchronization markers associated with the 
store data elements , wherein each write store data element in the second subqueue corresponds to 
[[a]] one of the memory request requests in the first queue subqueue , and wherein the write store 
data elements are inserted into the fifth queue second subqueue decoupled from the inserting of 
the memory requests into the first queue subqueue . 

20. (Currently Amended) The method of claim [[11]] 14, further comprioing [[:]] wherein 
providing the instruction-processing circuits includes providing a data cache and an e xternal 
cache [[,]] wherein first synchronization instruction is an performing the Lsync V,S type 
instruction^; and]] includes preventing subsequent scalar references from accessing the data 
cache until all vector references have been sent to [[the]] an external cache and all vector writes 
have caused any necessary invalidations of the data cache based on the first synchronization 
instruction . 



references. 




»[[•]] 
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21. (Currently Amended) An apparatus comprising: 
a memory interface; 

a first plurality of queues connected to the memory interface , including a first queue and 
a second queue, each of the first plurality of queues for holding a plurality of pending memory 
requests; 

one or more instruction-processing circuits, wherein each instruction-processing circuit is 
operatively coupled through the plurality of queues to the memory interface and wherein each of 
the plurality of instruction-processing circuits includes: 

means for inserting one or more memory requests into at least one of the queues 
based on a first memory operation instruction that specifies a memory op e ration executed 
in one of the instruction-processing circuits ; 

means for[[,]] based on a s e cond instruction that specifies a synchronization 
operation inserting a first synchronization marker into the first queue and inserting a 
second synchronization marker into the second queue based on a synchronization 
operation instruction executed in one of the instruction-processing circuits ; and 

means for inserting one or more memory requests into at least one of the queues 
based on a third second memory operation instruction that specifies a memory operation 
executed in one of the instruction-processing circuits ; 
means for processing memory requests from the first queue; and 

means for selectively halting further processing of memory requests from the first queue 
based on the first synchronization marker reaching a predetermined point in the first queue until 
the corresponding second synchronization marker reaches a predetermined point in the second 
queue[[.]]; 

wherein each of the memory requests is a memory reference. 

22. (Currently Amended) The apparatus of claim 21, wherein the plurality of queues is 
within a first processor, and wherein the means for inserting to the first queue operates for only 
vector memory requests and synchronizations, and [[the]] means for inserting to the second 
queue operates for only scalar memory requests and synchronizations. 
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23. (Currently Amended) The apparatus of claim 22, wherein the [[first]] synchronization 
operation instruction is an Lsync-type instruction. 



24. (New) A system comprising: 

a plurality of processors, including a first processor and a second processor, wherein each 
of the processors includes: 

a memory interface; 

a plurality of Lsync queues connected to the memory interface, including a first 
Lsync queue and a second Lsync queue, each of the plurality of Lsync queues for holding 
a plurality of pending memory requests; 

one or more instruction-processing circuits, wherein each instruction-processing 
circuit is operatively coupled through the plurality of Lsync queues to the memory 
interface and wherein each of the plurality of instruction-processing circuits inserts one or 
more memory requests into at least one of the Lsync queues based on a first memory 
operation instruction, inserts a first Lsync synchronization marker into the first Lsync 
queue and inserts a second Lsync synchronization marker into the second Lsync queue 
based on a synchronization operation instruction, and inserts one or more memory 
requests into at least one of the Lsync queues based on a second memory operation 
instruction; and 

a Lsync synchronization circuit, operatively coupled to the plurality of Lsync 
queues, that selectively halts processing of further memory requests from the first Lsync 
queue based on the first Lsync synchronization marker reaching a predetermined point in 
the first Lsync queue until the corresponding second Lsync synchronization marker 
reaches a predetermined point in the second Lsync queue; and 
one or more Msync circuits, wherein each of the Msync circuits is connected to the 
plurality of processors and wherein each of the Msync circuits includes: 

a plurality of Msync queues, including a first Msync queue and a second Msync 
queue, each of the plurality of Msync queues for holding a plurality of pending memory 
requests received from the Lsync queues, wherein the first Msync queue stores only 
Msync synchronization markers and memory requests from the first processor, and the 
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second Msync queue stores only Msync synchronization markers and memory requests 
from the second processor; and 

an Msync synchronization circuit, operatively coupled to the plurality of Msync 
queues, that selectively halts further processing of the memory requests from the first 
Msync queue based on an Msync synchronization marker reaching a predetermined point 
in the first Msync queue until a corresponding Msync synchronization marker from the 
second processor reaches a predetermined point in the second Msync queue; 
wherein each of the memory requests is a memory reference, wherein the memory 

reference is generated as a result of execution of instructions by instruction-processing circuits in 

each processor. 

25. (New) The system of claim 24, wherein the Msync synchronization circuit includes a 
plurality of stall lines, wherein each of the stall lines is connected to one of the plurality of 
Msync queues and wherein each of the stall lines is for halting further processing of the memory 
requests from a corresponding Msync queue. 

26. (New) The system of claim 24, wherein each processor includes a data cache and 
wherein each Msync synchronization circuit includes an external cache, wherein the data cache 
and the external cache are used to perform an Lsync V,S type instruction, wherein the Lsync V,S 
type instruction prevents subsequent scalar references from accessing the data cache until all 
vector references have been sent to the external cache in a corresponding Msync synchronization 
circuit and all vector writes have caused any necessary invalidations of the data cache. 

27. (New) A method comprising: 

providing a plurality of processors, including a first processor and a second processor, 
wherein each of the processors includes a memory interface, a plurality of Lsync queues 
connected to the memory interface, including a first Lsync queue and a second Lsync queue, and 
one or more instruction-processing circuits, each of the instruction-processing circuits 
operatively coupled through the plurality of Lsync queues to the memory interface; 
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providing one or more Msync circuits, wherein each of the Msync circuits is connected to 
the plurality of processors and wherein each of the Msync circuits includes a plurality of Msync 
queues, including a first Msync queue and a second Msync queue, each of the plurality of Msync 
queues operatively coupled to the plurality of Lsync queues in one of the plurality of processors; 

inserting one or more memory requests into at least one of the Lsync queues based on a 
first memory operation instruction executed in one of the instruction-processing circuits; 

inserting a first Lsync synchronization marker into the first Lsync queue and inserting a 
second Lsync synchronization marker into the second Lsync queue based on a synchronization 
operation instruction executed in one of the instruction-processing circuits; 

inserting one or more memory requests into at least one of the Lsync queues based on a 
second memory operation instruction executed in one of the instruction-processing circuits; 

processing memory requests from the first Lsync queue; 

selectively halting further processing of memory requests from the first Lsync queue 
based on the first Lsync synchronization marker reaching a predetermined point in the first 
Lsync queue until the corresponding second Lsync synchronization marker reaches a 
predetermined point in the second Lsync queue; 

inserting Msync synchronization markers and memory requests received from the Lsync 
queues in the first processor into the first Msync queue; 

inserting Msync synchronization markers and memory requests received from the Lsync 
queues in the second processor into the second Msync queue; and 

selectively halting further processing of the memory requests from the first Msync queue 
based on an Msync synchronization marker reaching a predetermined point in the first Msync 
queue until a corresponding Msync synchronization marker from the second processor reaches a 
predetermined point in the second Msync queue; 

wherein each of the memory requests is a memory reference. 



28. (New) The method of claim 27, wherein selectively halting further processing of the 
memory requests from the first Msync queue includes sending a stall signal to the Msync queues. 
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29. (New) The method of claim 27, wherein selectively halting further processing of 
memory requests from the first Lsync queue includes performing an Lsync V,S type instruction, 
wherein performing the Lsync V,S type instruction includes preventing subsequent scalar 
references from accessing a data cache in the processor until all vector references have been sent 
to an external cache in a corresponding Msync synchronization circuit and all vector writes have 
caused any necessary invalidations of the data cache. 



